Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams
Abstract
This paper proposes new techniques for efficiently parallelizing sliding window processing over data streams on a shared-nothing cluster of commodity hardware. Data streams are first partitioned on the fly by a continuous split stage that takes the query semantics into account, so that partitioning respects the natural chunking (windowing) of the stream by the query. This split alone does not scale well enough when there is a high degree of overlap across the windows. To remedy this problem, we propose two alternative partitioning strategies, based on batching and on pane-based processing, respectively. Lastly, we provide a continuous merge stage that combines the results on the fly while meeting QoS requirements on ordered delivery. We implemented these techniques as part of the Borealis distributed stream processing system and conducted experiments based on the Linear Road Benchmark that show the scalability of our techniques.
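To illustrate why a window-aware split replicates tuples when windows overlap, here is a minimal Python sketch. It is not the paper's or Borealis's implementation: the function names `window_ids` and `split`, the integer timestamps, the window layout `[k * slide, k * slide + size)`, and the round-robin assignment of windows to nodes are all assumptions made for the example.

```python
from collections import defaultdict


def window_ids(ts, size, slide):
    """Ids of all sliding windows [k*slide, k*slide + size) that contain ts.

    Assumes integer timestamps and windows starting at time 0 (illustrative).
    """
    last = ts // slide                         # latest window that has started
    first = max(0, (ts - size) // slide + 1)   # earliest window still open
    return list(range(first, last + 1))


def split(stream, size, slide, num_nodes):
    """Route each (ts, value) tuple to every node that owns one of its windows.

    Window k is assigned to node k % num_nodes (hypothetical round-robin policy).
    A tuple is sent at most once to each node, even if that node owns
    several of the windows the tuple belongs to.
    """
    parts = defaultdict(list)
    for ts, value in stream:
        nodes = {w % num_nodes for w in window_ids(ts, size, slide)}
        for node in sorted(nodes):
            parts[node].append((ts, value))
    return dict(parts)
```

With `size` equal to `slide` (tumbling windows) every tuple goes to exactly one node. As `size / slide` grows, each tuple belongs to more windows and is copied to more nodes, which is the overlap cost that the batching and pane-based strategies are meant to reduce.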
Similar resources
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns, so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Ivanova Scalable Scientific Stream Query Processing
Ivanova, M. 2005. Scalable Scientific Stream Query Processing. Acta Universitatis Upsaliensis. Uppsala Dissertations from the Faculty of Science and Technology 66. 137 pp. Uppsala. ISBN 91-554-6351-7 Scientific applications require processing of high-volume on-line streams of numerical data from instruments and simulations. In order to extract information and detect interesting patterns in thes...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding Window Techniques
Online mining of streaming data is one of the most important issues in data mining. In this paper, we proposed an efficient one.frequent item sets over a transaction-sensitive sliding window), to mine the set of all frequent item sets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorit...
Customizable Parallel Execution of Scientific Stream Queries
Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are desc...
A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries
Data streams are infinite, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries over them are mostly one-pass. Executing such algorithms raises challenges such as memory limitations, scheduling, and accuracy of answers. They will be more ...